Distributed Ranked Search
Abstract
P2P deployments are a natural infrastructure for building distributed search networks. Proposed systems support locating and retrieving all results, but lack the information necessary to rank them. Users, however, are primarily interested in the most relevant results, not necessarily all possible results. Using random sampling, we extend a class of well-known information retrieval ranking algorithms so that they can be applied in this decentralized setting. We analyze the overhead of our approach and quantify how our system scales with an increasing number of documents, system size, document-to-node mapping (uniform versus non-uniform), and types of queries (rare versus popular terms). Our analysis and simulations show that (a) these extensions are efficient and scale with little overhead to large systems, and (b) the accuracy of the results obtained using distributed ranking is comparable to that of a centralized implementation.
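The abstract does not spell out the ranking algorithm, so the following is only a minimal sketch of the sampling idea under stated assumptions: the ranker is assumed to be TF-IDF style, each node is modeled as a list of tokenized documents, and the names `estimate_idf`, `rank`, and `NODES` are hypothetical. It shows how a global statistic (inverse document frequency) could be approximated from a random sample of nodes instead of a full scan.

```python
import math
import random

# Hypothetical toy network: each node holds a list of documents,
# and each document is a list of terms.
NODES = [
    [["p2p", "search"], ["p2p", "rank"]],
    [["p2p", "node"]],
    [["search", "rank"]],
    [["p2p"]],
]


def estimate_idf(nodes, term, sample_size, rng):
    """Estimate the IDF of `term` from a random sample of nodes."""
    sample = rng.sample(nodes, min(sample_size, len(nodes)))
    docs_seen = sum(len(node) for node in sample)
    docs_with_term = sum(1 for node in sample for doc in node if term in doc)
    # Add-one smoothing keeps the estimate finite if the sample misses the term.
    return math.log((docs_seen + 1) / (docs_with_term + 1))


def rank(docs, query_terms, idf):
    """Order documents by summed term frequency times IDF, best first."""
    scored = [
        (sum(doc.count(t) * idf[t] for t in query_terms), doc) for doc in docs
    ]
    return [doc for _, doc in sorted(scored, key=lambda p: p[0], reverse=True)]


if __name__ == "__main__":
    rng = random.Random(42)
    query = ["p2p", "node"]
    # Sample only two of the four nodes to approximate the global statistics.
    idf = {t: estimate_idf(NODES, t, 2, rng) for t in query}
    all_docs = [doc for node in NODES for doc in node]
    print(rank(all_docs, query, idf)[:3])
```

With `sample_size` equal to the number of nodes, the estimate coincides with the smoothed IDF a centralized index would compute; smaller samples trade accuracy for less communication, which is the trade-off the abstract's overhead analysis addresses.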
Similar resources
Fuzzy retrieval of encrypted data by multi-purpose data-structures
The growing amount of information that has arisen from emerging technologies has caused organizations to face challenges in maintaining and managing their information. Expanding hardware and human resources, and outsourcing data management and maintenance to an external organization in the form of cloud storage services, are two common approaches to overcoming these challenges. The first approach costs of...
Panta Rhei: Optimized and Ranked Data Processing over Heterogeneous Sources
In the era of digital information, the value of data resides not only in its volume and quality, but also in the additional information that can be inferred from the combination (aggregation, comparison and join) of such data. There is a concrete need for data processing solutions that combine distributed and heterogeneous data sources, such as Web services, relational databases, and even searc...
Clustering and Ranked Search for Enterprise Content Management
The aim of this work is to understand more closely where the border lies between relational and Not Only Structured Query Language (NoSQL) platforms in the Enterprise Content Management (ECM) area. Another objective (closely related to the first one) is to specify the conceptual architecture of the distributed ECM system. The authors specify the model of the prototype ECM system and compare...
Application of Tabu Search to Optimal Placement of Distributed Generation and Reactive Power Sources
Introducing distributed generation into a power system can lead to numerous benefits including technical, economic, environmental, etc. To attain these benefits, distributed generators with proper rating should be installed at suitable locations. Given the similar effects of distributed generators and capacitor banks on operation indices of a distribution system, simultaneous assignment of best...
A Distributed Weighted Proceedings JENC
This paper describes the WHERE system, an approach to a distributed indexing service for document search on the Internet based upon an architecture of centroids. Numerical data, produced with the aid of Information Retrieval techniques, in the form of a weighing measure, are added to the Whois++ centroids enabling ranked results to be delivered to clients, which they use not only for presenting...